Making Publication Metric Tracking Easy

Using a Reproducible, Integrated System of R and Microsoft Power BI® to Ease the Pain of Assessing Publication Metrics

Joshua J. Cook, M.S., ACRP-PM, CCRC

Andrews Research & Education Foundation (AREF)

Biography

Disclosure - I’ve been in data science for about 1 year, and this was my first data science-related presentation.

  • 2021 - B.S., Biomedical Science, University of West Florida

  • 2023 - M.S., Clinical Research Management, Wake Forest University

  • 2023 - ACRP-PM/CCRC, Association of Clinical Research Professionals (ACRP)

  • 2024 - M.S., Data Science, University of West Florida

  • 2025 - Entry into M.D./Ph.D. program

Publication Metrics

Publications in this talk refer to peer-reviewed literature from academic journals.

Publications are quantified in the form of publication data metrics.

These metrics can include:

  • Publication counts

  • Citation counts

  • Affiliation spread (via journals)

  • Journal impact factor (JIF), or 5-year JIF

Uses

Publication metrics signify individual, team, and organizational productivity and impact.

Used to make business decisions:

  • Promotions/tenure

  • Awards

  • Grant funding

  • Clinical study sponsorship

Problem

Research managers use this:

With something like this…

The Solution - easyPubMed

easyPubMed

An R package that interfaces with the Entrez Programming Utilities hosted by the National Center for Biotechnology Information (NCBI).

  • Author: Damiano Fantini, Ph.D.
  • Specialized version of the rentrez R package…
  • Two-step process: building queries using PubMed field tags, then retrieving records matching the queries from PubMed

Setup

# Create R project (Rproj), primary folder, working directory

if(!require("tidyverse")) install.packages("tidyverse")
if(!require("easyPubMed")) install.packages("easyPubMed")
if(!require("XML")) install.packages("XML")

library(tidyverse) 
# Data wrangling 
library(easyPubMed) 
# Entrez interface
library(XML) 
# Reading and creating XML docs

1. Understanding the Query

AnzQuery <- "Adam W Anz[AU]" 
# Author field tag (first or any order)

AllAnzQuery <- "Adam W Anz[AU] OR Adam Anz[AU]" 
# Field tag combination with "AND" or "OR" syntax

AnzJournalQuery <- paste(
  "Adam W Anz[AU]",
  "AND (American Journal of Sports Medicine[TA] OR Arthroscopy[TA])"
)
# Combining field tags - full list in the paper

AnnoyingNameQuery <- "Christopher O'Grady[AU]"
# Names with an apostrophe need no escape inside double quotes
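
When many authors or aliases are involved, the query strings above can be assembled programmatically instead of typed by hand. The following is a minimal sketch in base R; build_author_query() is a hypothetical helper (not part of easyPubMed) that only pastes the [AU] and [TA] field tags shown above.

```r
# Hypothetical helper (not part of easyPubMed): join author aliases with OR,
# then optionally restrict the result to journals using the [TA] field tag.
build_author_query <- function(aliases, journals = NULL) {
  author_part <- paste0(aliases, "[AU]", collapse = " OR ")
  if (is.null(journals)) {
    return(author_part)
  }
  journal_part <- paste0(journals, "[TA]", collapse = " OR ")
  paste0("(", author_part, ") AND (", journal_part, ")")
}

# All aliases for one author
alias_query <- build_author_query(c("Adam W Anz", "Adam Anz"))

# One alias restricted to two journals
journal_query <- build_author_query(
  "Adam W Anz",
  journals = c("American Journal of Sports Medicine", "Arthroscopy")
)

print(alias_query)
print(journal_query)
```

The resulting strings can be passed to get_pubmed_ids() in the same way as the hand-written queries.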

2. Retrieving Records

AnzQuery <- "Adam W Anz[AU]"
# Previous query

AnzIDs <- get_pubmed_ids(AnzQuery)
# Retrieving query matches

Anz_abstracts <- fetch_pubmed_data(
  pubmed_id_list = AnzIDs,
  format = "abstract"
)
# Using PMIDs to download article information (as abstract)

print(Anz_abstracts[1:16])

2. Retrieving Records

 [1] "1. Arthroscopy. 2023 Mar;39(3):728-729. doi: 10.1016/j.arthro.2022.11.030."      
 [2] ""                                                                                
 [3] "Editorial Commentary: Elbow Injury Results When Pediatric and Adolescent "       
 [4] "Throwing Athletes Throw as Hard as Possible, and Weighted Baseball Training "    
 [5] "Should Be Banned for Youth Athletes."                                            
 [6] ""                                                                                
 [7] "Anz AW(1)."                                                                      
 [8] ""                                                                                
 [9] "Author information:"                                                             
[10] "(1)Andrews Research & Education Foundation and Andrews Institute for "           
[11] "Orthopaedics & Sports Medicine."                                                 
[12] ""                                                                                
[13] "Comment on"                                                                      
[14] "    Arthroscopy. 2023 Mar;39(3):719-727."                                        
[15] ""                                                                                
[16] "We are in the middle of an epidemic involving pediatric and adolescent throwing "
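
Before downloading records, it can help to confirm how many articles a query matched. This is a minimal sketch, assuming the object returned by get_pubmed_ids() is a list with Count and IdList elements (as in easyPubMed 2.x). summarize_pubmed_ids() is a hypothetical helper, and the mock result uses placeholder PMIDs so the example runs offline.

```r
# Hypothetical helper: pull the match count and PMIDs out of the list
# returned by get_pubmed_ids() (assumed to contain Count and IdList).
summarize_pubmed_ids <- function(id_result) {
  list(
    n_matches = as.integer(id_result$Count),
    pmids = unlist(id_result$IdList, use.names = FALSE)
  )
}

# Mock result with placeholder PMIDs (not real records)
mock_ids <- list(Count = "2", IdList = list(Id = "00000001", Id = "00000002"))

id_summary <- summarize_pubmed_ids(mock_ids)
print(id_summary$n_matches)
print(id_summary$pmids)

# With a live connection: summarize_pubmed_ids(get_pubmed_ids(AnzQuery))
```

An unexpectedly large or small count is an early hint that the query needs more aliases or a tighter restriction.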

2. Retrieving Records

Anz_xml <- fetch_pubmed_data(
  pubmed_id_list = AnzIDs, 
  format="xml"
  )
# Using PMIDs to download article information (XML)

Anz_titles <- custom_grep(
  Anz_xml,
  "ArticleTitle", 
  "char"
  )
# Extracting XML-tagged data (Article Titles)

print(Anz_titles[1:16])

2. Retrieving Records

 [1] "Editorial Commentary: Elbow Injury Results When Pediatric and Adolescent Throwing Athletes Throw as Hard as Possible, and Weighted Baseball Training Should Be Banned for Youth Athletes."                                                         
 [2] "Blood Flow Restriction Using a Pneumatic Tourniquet Is Not Associated With a Cellular Systemic Response."                                                                                                                                          
 [3] "Bone Marrow Aspirate Concentrate Is Equivalent to Platelet-Rich Plasma for the Treatment of Knee Osteoarthritis at 2 Years: A Prospective Randomized Trial."                                                                                       
 [4] "The safety and efficacy of 2 anterior-inferior portals for arthroscopic repair of anterior humeral avulsion of the glenohumeral ligament: cadaveric comparison."                                                                                   
 [5] "Association Between Passive Hip Range of Motion and Pitching Kinematics in High School Baseball Pitchers."                                                                                                                                         
 [6] "Platelet-Rich Plasma: Fundamentals and Clinical Applications."                                                                                                                                                                                     
 [7] "Arthroscopic Subchondral Drilling Followed by Injection of Peripheral Blood Stem Cells and Hyaluronic Acid Showed Improved Outcome Compared to Hyaluronic Acid and Physiotherapy for Massive Knee Chondral Defects: A Randomized Controlled Trial."
 [8] "Biologic Association Annual Summit: 2020 Report."                                                                                                                                                                                                  
 [9] "Elevation of Peripheral Blood CD34+ and Platelet Levels After Exercise With Cooling and Compression."                                                                                                                                              
[10] "Mobilized Peripheral Blood Stem Cells are Pluripotent and Can Be Safely Harvested and Stored for Cartilage&#xa0;Repair."                                                                                                                           
[11] "Blood Flow Restriction Training Using the Delfi System Is Associated With a Cellular Systemic Response."                                                                                                                                           
[12] "Chondral Lesions of the Knee: An Evidence-Based Approach."                                                                                                                                                                                         
[13] "The Effects of Body Mass Index on Softball Pitchers' Hip and Shoulder Range of Motion."                                                                                                                                                            
[14] "Lower Extremity Pain and Pitching Kinematics and Kinetics in Collegiate Softball Pitchers."                                                                                                                                                        
[15] "Bone Marrow Aspirate Concentrate Is Equivalent to PRP for the Treatment of Knee OA at 1 Year: Response."                                                                                                                                           
[16] "Autologous thrombin preparations: Biocompatibility and growth factor release."                                                                                                                                                                     
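
Title [10] above still contains the raw XML character reference &#xa0; (a non-breaking space), because the text is extracted from the XML as-is. A minimal base R sketch for cleaning such strings follows. decode_xml_entities() is a hypothetical helper; it only handles hexadecimal references and the five predefined XML entities, and it maps non-breaking spaces to regular spaces.

```r
# Hypothetical helper: decode hexadecimal character references (e.g. &#xa0;)
# and the five predefined XML entities in a character vector.
decode_xml_entities <- function(x) {
  hits <- gregexpr("&#x[0-9A-Fa-f]+;", x)
  regmatches(x, hits) <- lapply(regmatches(x, hits), function(refs) {
    codes <- strtoi(gsub("^&#x|;$", "", refs), base = 16L)
    codes[codes == 160L] <- 32L  # non-breaking space -> regular space
    vapply(codes, intToUtf8, character(1))
  })
  x <- gsub("&lt;", "<", x, fixed = TRUE)
  x <- gsub("&gt;", ">", x, fixed = TRUE)
  x <- gsub("&quot;", "\"", x, fixed = TRUE)
  x <- gsub("&apos;", "'", x, fixed = TRUE)
  # &amp; is decoded last so that escaped entities are not decoded twice
  gsub("&amp;", "&", x, fixed = TRUE)
}

raw_title <- "Mobilized Peripheral Blood Stem Cells are Pluripotent and Can Be Safely Harvested and Stored for Cartilage&#xa0;Repair."
clean_title <- decode_xml_entities(raw_title)
print(clean_title)
```

The same function can be applied to the whole vector, for example decode_xml_entities(Anz_titles).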

Retrieving Full Article Data

Anz_download <- batch_pubmed_download(
  pubmed_query_string = AnzQuery,
  format = "xml",
  batch_size = 1000,
  encoding = "UTF8"
)
[1] "PubMed data batch 1 / 1 downloaded..."
# Downloading ALL record information in XML format

Sorting Full Article Data

Anz_list <- articles_to_list(pubmed_data = Anz_download)
# Sorting XML files into a list of article-specific information

Anz_df_list <- lapply(Anz_list, article_to_df, autofill = TRUE)
# Extracting article-specific information from the list
# Stored as a list of tidy dataframes

Anz_full_list <- do.call(rbind, Anz_df_list)
# Unnesting the list into one dataframe
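
Once the records are in a single dataframe, a publication count is a short aggregation. The sketch below assumes the dataframe has pmid and year columns and one row per author per article (which is how article_to_df() output is typically shaped), so articles are de-duplicated by pmid first. count_publications_by_year() is a hypothetical helper, and the toy data are placeholders, not real records.

```r
# Hypothetical helper: count unique articles per publication year.
# Assumes columns `pmid` and `year`, with one row per author per article.
count_publications_by_year <- function(df) {
  articles <- df[!duplicated(df$pmid), c("pmid", "year")]
  counts <- as.data.frame(table(articles$year), stringsAsFactors = FALSE)
  names(counts) <- c("year", "publications")
  counts
}

# Placeholder data standing in for Anz_full_list
toy_pubs <- data.frame(
  pmid = c("1", "1", "2", "3"),
  year = c("2022", "2022", "2022", "2023"),
  lastname = c("Anz", "Smith", "Anz", "Anz"),
  stringsAsFactors = FALSE
)

yearly_counts <- count_publications_by_year(toy_pubs)
print(yearly_counts)
```

The same result can be produced with dplyr (distinct() followed by count()) for those who prefer the tidyverse.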

Cleaning and Validation Tips

  • Include as many author aliases as possible inside the query

  • Use a differentiating variable to validate the data (i.e., is this the correct Adam Anz?)

  • Wrangle as needed with the tidyverse

  • Scale up with as many authors as needed for your organization/project
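
As one way to apply the validation tip above, the author affiliation can serve as the differentiating variable. This is a minimal sketch, assuming the dataframe has pmid, lastname, and address columns. flag_verified_articles() is a hypothetical helper, and the toy rows are placeholders rather than real records.

```r
# Hypothetical helper: mark every row of an article as verified when the
# target author's affiliation on that article matches a pattern.
flag_verified_articles <- function(df, lastname, affiliation_pattern) {
  is_target <- tolower(df$lastname) == tolower(lastname)
  has_affiliation <- grepl(affiliation_pattern, df$address, ignore.case = TRUE)
  verified_pmids <- unique(df$pmid[is_target & has_affiliation])
  df$verified <- df$pmid %in% verified_pmids
  df
}

# Placeholder data standing in for Anz_full_list
toy_pubs <- data.frame(
  pmid = c("1", "1", "2"),
  lastname = c("Anz", "Smith", "Anz"),
  address = c(
    "Andrews Research & Education Foundation",
    "Example University",
    "Another Institute"
  ),
  stringsAsFactors = FALSE
)

checked <- flag_verified_articles(toy_pubs, "Anz", "Andrews Research")
print(checked[, c("pmid", "lastname", "verified")])
```

Rows left unverified are candidates for manual review, not automatic exclusion, since authors change institutions and affiliations are sometimes missing.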

Reporting Options: Quarto® and Microsoft Power BI®

A Basic Quarto® Report

1. Quarto®

An open-source scientific and technical publishing system built into RStudio (i.e., the next generation of R Markdown)

6 Quick Steps:

  1. In RStudio®, create a new Quarto® Document
  2. Edit the YAML header to fit the needs of the report
  3. Create an R code block for setup and data wrangling. The output of this block should not be included in the report
  4. Create additional R code blocks to display tables and figures that tell the story of the data (using packages such as gt and ggplot2).
  5. Add branding and context using external text, images, and links
  6. Render the document
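
The steps above can be sketched as a short R script that writes a skeleton Quarto® document. This is an illustration only: write_report_skeleton() is a hypothetical helper, the title and chunk contents are placeholders, and rendering assumes the Quarto® CLI and the quarto R package are installed.

```r
# Hypothetical helper: write a minimal .qmd file with a YAML header,
# a hidden setup chunk, and one visible chunk for a table or figure.
write_report_skeleton <- function(path) {
  fence <- strrep("`", 3)  # code-fence delimiter used by Quarto chunks
  lines <- c(
    "---",
    "title: \"Publication Metrics Report\"",
    "format: html",
    "---",
    "",
    paste0(fence, "{r}"),
    "#| label: setup",
    "#| include: false",
    "library(tidyverse)",
    "# Load or build the publication dataframe here",
    fence,
    "",
    "## Publications per year",
    "",
    paste0(fence, "{r}"),
    "#| echo: false",
    "# Table or figure code goes here (e.g., gt or ggplot2)",
    fence
  )
  writeLines(lines, path)
  invisible(path)
}

qmd_path <- write_report_skeleton(
  file.path(tempdir(), "publication_report.qmd")
)
print(readLines(qmd_path)[1:4])

# To render (assumes Quarto is installed): quarto::quarto_render(qmd_path)
```

The include: false option on the setup chunk keeps both its code and its output out of the rendered report, which matches step 3.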

The Final Product

Power BI

2. Microsoft Power BI®

A proprietary business intelligence (BI) application developed by Microsoft. Three basic options for connecting our data to the Power BI desktop client:

  1. Export the dataframe from R as a static data source (e.g., xlsx, csv)

  2. Save the RData file within RStudio and load it from a defined working directory within the Power BI desktop client

  3. Run the entire R script within the Power BI desktop client (no dedicated IDE needed!)
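
The first two options can be sketched in a few lines of base R. The dataframe, file names, and output directory below are placeholders; in practice the dataframe would be Anz_full_list and the directory a project folder that Power BI can reach.

```r
# Placeholder dataframe standing in for Anz_full_list
pubs <- data.frame(
  pmid = c("1", "2"),
  year = c("2022", "2023"),
  stringsAsFactors = FALSE
)

out_dir <- tempdir()  # replace with a project folder in practice
csv_path <- file.path(out_dir, "publications.csv")
rdata_path <- file.path(out_dir, "publications.RData")

# Option 1: static export, imported into Power BI as a text/CSV source
write.csv(pubs, csv_path, row.names = FALSE)

# Option 2: save the R object so an R script data source can load it
save(pubs, file = rdata_path)

# Inside a Power BI R script data source, the script would then be, e.g.:
# load("C:/path/to/publications.RData")
```

For option 3, the full retrieval script is pasted into the R script data source instead; as far as I know, Power BI imports only objects that are dataframes, so the final result should be left as a dataframe.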

Simple Microsoft Power BI® Report Steps

  1. Transform the data using standard Microsoft Power Query® syntax

  2. From the visualizations tab, edit report page settings to define the canvas size, background, and any other customizations

  3. Choose from the built-in visualizations in Microsoft Power BI®, external visualizations from the visualization store, or scripted visuals (such as R or Python)

  4. Once a visualization is chosen, drag fields (data columns) of interest from the data tab to the visualization or filter tabs.

  5. Continue adding visualizations to tell the story of the data

  6. Add branding and context using external text, images, and links

  7. Publish the final report to the Microsoft Power BI Online Service® to distribute to stakeholders

Conclusions

  • Publication metrics are increasingly being used to measure individual- and organization-level productivity and impact within academia and industry

  • Historically, publication metrics have been difficult to quantify and manage

  • Instead of manually obtaining this data, it is much more feasible to leverage the R programming language and various reporting systems to manage this data

  • In the future, this system should also capture article citation counts and journal impact factors to add more context to these metrics. Automation of this system (i.e., automated, timed data refreshes) using a third-party program should be investigated as well

Contact Information

Joshua J. Cook, M.S., ACRP-PM, CCRC

Cell: (850)736-1801

Email: jcook0312@outlook.com (Email me for the full paper!)